

Defence Research and Development Canada
Recherche et développement pour la défense Canada




A simple expression for the matrix gradient of a diagonal element of R in QR decomposition

for use in MIMO Communications and Signal Processing

A.  Yasotharan 


Defence  R&D  Canada  -  Ottawa 

Technical  Memorandum 
DRDC  Ottawa  TM  2010-247 
December  2010 






Author 


Original signed by A. Yasotharan
A.  Yasotharan 

Approved  by 

Original  signed  by  Bill  Katsube 
Bill  Katsube 

Head/Communications  and  Navigation  Electronic  Warfare 

Approved  for  release  by 

Original  signed  by  Chris  McMillan 

Chris  McMillan 
Head/Document  Review  Panel 


©  Her  Majesty  the  Queen  in  Right  of  Canada  as  represented  by  the  Minister  of  National 
Defence,  2010 



Abstract 


The  QR  matrix  decomposition  (QRD),  or  factorization,  has  many  applications.  In  matrix 
computations,  it  is  used  to  solve  linear  equations  and  least  squares  problems.  In  signal 
processing, it is used for adaptive filtering, adaptive beamforming/interference nulling, and
direction  finding.  In  communications,  it  is  used  for  adaptive  equalization  and  transceiver 
design  for  multiple-input  multiple-output  (MIMO)  channels. 

When  QRD  is  used  in  signal  processing  and  communications,  it  is  of  interest  to  know  the 
effects  of  noise.  This  work  is  a  first  step  towards  that  goal. 

Given matrix A with full column-rank M, we consider the unique decomposition A = QR
where Q is a matrix with M orthonormal columns and R is an $M \times M$ upper triangular
matrix with real positive diagonal elements $r_1, r_2, \ldots, r_M$. Treating $r_i$ as a function of the
elements of A, a simple expression is derived for its matrix gradient with respect to A.

Future  work  will  aim  to  derive  expressions  for  the  gradients  of  all  elements  of  Q  and  R, 
and  use  these  expressions  to  evaluate  the  effect  of  noise  perturbation  in  A.  The  present 
result  is  useful  in  optimizing  certain  MIMO  decision-feedback  communication  systems. 


Executive summary


A  simple  expression  for  the  matrix  gradient  of  a 
diagonal  element  of  R  in  QR  decomposition 

A.  Yasotharan;  DRDC  Ottawa  TM  2010-247;  Defence  R&D  Canada  -  Ottawa; 
December  2010. 

Background:  In  linear  algebra,  it  is  well  known  that  any  matrix  A  can  be  decomposed, 
or  factorized,  as  A  =  QR  where  Q  is  a  matrix  with  orthonormal  columns  and  R  is  an 
upper  triangular  matrix.  This  so-called  QR  decomposition  (QRD)  has  many  applications. 
In  matrix  computations,  it  is  used  to  solve  linear  equations  and  least  squares  problems. 
In signal processing, it is used for adaptive filtering, adaptive beamforming/interference
nulling,  and  direction  finding.  In  communications,  it  is  used  for  adaptive  equalization  and 
transceiver  design  for  multiple-input  multiple-output  (MIMO)  channels. 

In  signal  processing  and  communications,  it  is  of  interest  to  know  how  noise  perturbation 
of  A  affects  the  elements  of  Q  and  R.  This  work  is  a  first  step  towards  that  goal. 

Principal results: In general the QRD is not unique. When A has full column-rank M,
uniqueness can be ensured by choosing the matrix Q to have M columns and constraining
the diagonal elements of R, say $r_1, r_2, \ldots, r_M$, to be positive. We consider this case.

Treating $r_i$ as a function of the elements of A, a simple expression is derived for its matrix
gradient with respect to A. Let $A = X + jY$ be the real-imaginary decomposition of
A. Let $\mathbf{q}_i$ and $\mathbf{s}_i$ be the $i$th columns of Q and $S = R^{-1}$, respectively. It is shown that
$\frac{\partial r_i}{\partial X} + j\frac{\partial r_i}{\partial Y} = r_i\mathbf{q}_i\mathbf{s}_i^H$, which is a rank-one matrix.

Matrix gradients of functions of $\{r_i\}$ of the form $f(r_i)$ and $f(r_1, r_2, \ldots, r_M)$ are obtained.

Significance  of  results:  When  A  is  a  signal-plus-noise  matrix  whose  QRD  is  sought,  it  is 
of  interest  to  know  how  noise  affects  Q  and  R.  Here,  the  gradients  of  elements  of  Q  and 
R  with  respect  to  A  will  be  of  value,  as  seen  in  stochastic  perturbation  theory.  This  work 
constitutes  a  first  step  in  that  direction. 

In MIMO decision-feedback communication system design, the QRD of the transmitter-channel
composite matrix yields the optimum receiver and its performance. Specifically,
$\{r_i^2\}$ are the signal-to-noise ratios for the parallel data streams. Thus the derived expressions
for $\partial r_i/\partial A$ should be useful in jointly optimizing the transmitter and the receiver.
Future work: Expressions must be derived for the gradients of all elements of Q and
R,  as  these  expressions  will  be  useful,  according  to  the  stochastic  perturbation  theory,  in 
evaluating  the  effects  of  noise  in  the  QRD  of  a  signal-plus-noise  matrix. 

It would be worthwhile to demonstrate that the derived expressions for $\partial r_i/\partial A$ are useful in
MIMO system design.


1 Introduction


In linear algebra, it is well known that any matrix A can be decomposed, or factorized,
as A = QR where Q is a matrix with orthonormal columns and R is an upper triangular
matrix. A formal statement of this result is given in Section 2. This so-called QR decomposition
has many applications in matrix computations, such as solving linear equations and linear
least-squares fitting problems [1].

The QR decomposition also has many applications in signal processing and communications.
The signal processing applications include adaptive filtering [2] (chapters 14, 15),
adaptive beamforming [2] (chapter 14.5) [3] (chapter 7), and direction-finding [4]. The
adaptive beamforming methods that can be efficiently implemented via QR decomposition
include LSE and MVDR/MPDR [3]. These methods are also applicable to interference
nulling, e.g., GPS anti-jamming. The communications applications include adaptive
equalization [2] (chapter 15.9) and transceiver design for multiple-input multiple-output
(MIMO) communications [5] [6] [7].

In  the  above  signal  processing  and  communications  applications,  the  QR  decomposition 
of  a  signal-plus-noise  matrix  is  found.  Therefore,  it  is  of  interest  to  know  how  the  noise 
affects  the  factors  Q  and  R.  If  the  noise  variance  is  small  enough,  the  effect  of  noise  can 
be  studied  via  the  Stochastic  Perturbation  Theory  [8].  A  brief  overview  of  this  theory  is  as 
follows. Let A be a matrix and let F be a matrix-valued function of A with derivative $F_A$.
Given a perturbation matrix E, presumed small, we can write

$$F(A + E) \approx F(A) + F_A(E). \tag{1}$$

Suppose  E  is  random  and  of  the  cross-correlated  type  defined  in  [8].  Then  the  difference 

$$F(A + E) - F(A) \approx F_A(E) \tag{2}$$

can  be  estimated  in  a  stochastic  sense,  provided  we  have  a  tractable  expression  for  the 
derivative  FA(E). 
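For the scalar-valued functions considered in this report, the first-order term in (1) can be written explicitly in terms of the matrix gradient of Definition 1 (Section 2.2). The identity below is a standard first-order expansion, stated here as a sketch rather than quoted from [8]: writing $A = X + jY$ and $E = E_X + jE_Y$, a real-valued scalar function $c$ satisfies

$$c(A+E) \approx c(A) + \sum_{k,m}\left( \left[\frac{\partial c}{\partial X}\right]_{km}[E_X]_{km} + \left[\frac{\partial c}{\partial Y}\right]_{km}[E_Y]_{km} \right) = c(A) + \operatorname{Re}\,\operatorname{tr}\!\left( \left(\frac{\partial c}{\partial A}\right)^{H} E \right).$$

Thus, once $\partial c/\partial A$ is available in closed form, the first-order effect of any small perturbation $E$ reduces to a single trace.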

In  the  context  of  the  QR  decomposition  A  =  QR,  we  want  to  study  how  the  factors  Q 
and  R  are  affected  when  A  is  perturbed  by  noise.  The  work  reported  herein  constitutes 
a first step towards that goal. Treating the diagonal elements $\{r_i\}$ of R as functions of
A, we obtain a simple generic expression for their gradients with respect to A; since we
are dealing with scalar-valued functions, gradient and derivative are the same. Future work
will aim to derive expressions for the gradients of all elements of Q and R. These gradients
may then be used to study how noise in matrix A affects its QRD.

The  rest  of  the  report  is  organized  as  follows: 

Section 2 provides the needed mathematical preliminaries: a theorem on the QR decomposition,
the definition of the matrix gradient, and notation. It also states the scope of
this report. Section 3 states a result concerning the orthogonal projection of a vector onto
the orthogonal complement of the column span of a matrix. This result is used in Section 4
to derive a simple generic expression for the gradients of $\{r_i^2\}$. The main result of the
report, a simple expression for the gradients of $\{r_i\}$, is given in Section 4 along with other
results. Section 5 outlines an application of the result to MIMO communication system
design. Section 6 summarizes the results and suggests some lines of further work. Annexes
A and B derive gradients that are used in Section 3.
2 Mathematical Preliminaries and Scope

2.1  QR  Decomposition 

We begin with a well-known result in linear algebra. See [9] (Theorem 2.6.1) and [10]
(Section 5.2, especially Section 5.2.6).

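Theorem 1 (QR decomposition) Suppose $\mathcal{A}$ is a $K \times M$ complex matrix with $K \ge M$ and
$\operatorname{rank}(\mathcal{A}) = M$. Then $\mathcal{A}$ can be factored as

$$\mathcal{A} = \mathcal{Q}\mathcal{R} \tag{3}$$

where $\mathcal{Q}$ is a $K \times M$ matrix with orthonormal columns, i.e., $\mathcal{Q}^H\mathcal{Q} = I_M$, the $M \times M$ identity
matrix, and $\mathcal{R}$ is an $M \times M$ upper triangular nonsingular matrix.

If we insist that the diagonal elements of $\mathcal{R}$, say $r_1, r_2, \ldots, r_M$, are positive real, then $\mathcal{Q}$ and
$\mathcal{R}$ are unique. □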

The above factorization of $\mathcal{A}$ is the so-called thin QR factorization defined in [10] (text
preceding Theorem 5.2.2 on page 230). Its uniqueness when $\mathcal{R}$ has positive diagonal
elements is asserted by Theorem 5.2.2 and Section 5.2.10 of [10]. In terms of this unique
factorization, any general QR decomposition will have the form $\mathcal{A} = \tilde{\mathcal{Q}}\tilde{\mathcal{R}}$ where $\tilde{\mathcal{Q}} = \mathcal{Q}\mathcal{D}$
and $\tilde{\mathcal{R}} = \mathcal{D}^*\mathcal{R}$ for some diagonal matrix $\mathcal{D}$ with diagonal elements of unit absolute value.
Here $^*$ denotes complex conjugation.

Given a matrix, its QR decomposition can be computed by several methods, e.g.,
Gram-Schmidt orthogonalization, Householder reflections, and Givens rotations [10] (Section
5.2). The method most relevant to the present paper is Gram-Schmidt orthogonalization.
The properties of the QR decomposition given in [10] (Theorem 5.2.1) are
also very useful.
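The Gram-Schmidt route can be sketched in a few lines of pure Python. This is a minimal illustration, not the author's code: the function name `qr_gram_schmidt` is hypothetical, matrices are stored as lists of rows, and we use the modified (rather than classical) Gram-Schmidt variant. Normalizing each residual by its norm makes every diagonal element of R real and positive, which selects the unique factorization of Theorem 1.

```python
import math


def qr_gram_schmidt(A):
    """Thin QR of a K x M complex matrix A (list of rows), K >= M, full column rank.

    Modified Gram-Schmidt: column i is orthogonalized against q_1, ..., q_{i-1}
    one at a time; the norm of the residual becomes the real, positive r_i.
    """
    K, M = len(A), len(A[0])
    q_cols = []                                   # computed columns of Q
    R = [[0j] * M for _ in range(M)]
    for i in range(M):
        v = [A[k][i] for k in range(K)]           # start from the i-th column of A
        for j, q in enumerate(q_cols):
            R[j][i] = sum(q[k].conjugate() * v[k] for k in range(K))  # q_j^H v
            v = [v[k] - R[j][i] * q[k] for k in range(K)]             # remove q_j component
        r_i = math.sqrt(sum(abs(x) ** 2 for x in v))                  # residual norm
        if r_i == 0.0:
            raise ValueError("A does not have full column rank")
        R[i][i] = complex(r_i)
        q_cols.append([x / r_i for x in v])
    Q = [[q_cols[m][k] for m in range(M)] for k in range(K)]          # K x M
    return Q, R


A = [[1 + 2j, 0.5 - 1j],
     [2 - 1j, 1 + 1j],
     [0.3j, -1 + 0.5j]]
Q, R = qr_gram_schmidt(A)
print([round(R[i][i].real, 6) for i in range(2)])   # r_1, r_2 (both positive)
```

Modified Gram-Schmidt is used here only for readability; for ill-conditioned matrices, Householder-based QR is generally preferred numerically.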


2.2  Matrix  gradient 


Definition 1 (Matrix gradient) Let A be a complex-valued matrix and let A = X + jY
be its real-imaginary decomposition. For a real-valued scalar function c of A, we denote
by $\frac{\partial c}{\partial X}$ the matrix of partial derivatives of c with respect to the elements of X. Similarly, we
denote by $\frac{\partial c}{\partial Y}$ the matrix of partial derivatives of c with respect to the elements of Y. Thus $\frac{\partial c}{\partial X}$
and $\frac{\partial c}{\partial Y}$ are real-valued matrices of the same size as A. We also denote

$$\frac{\partial c}{\partial A} = \frac{\partial c}{\partial X} + j\frac{\partial c}{\partial Y}. \tag{4}$$

We call $\frac{\partial c}{\partial X}$ the matrix gradient of c with respect to X, and similarly for $\frac{\partial c}{\partial Y}$. We call $\frac{\partial c}{\partial A}$ the
matrix gradient of c with respect to A. □
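Definition 1 lends itself to a direct numerical check by finite differences, which is also a convenient way to test closed-form gradients. The sketch below is our illustration: `matrix_gradient` and `frob2` are hypothetical helper names, and the step size h = 1e-6 is an arbitrary choice. It perturbs the real and imaginary parts of each element separately and combines the two difference quotients as in (4). For $c(A) = \sum_{k,m}|a_{km}|^2$, each entry gives $2x + j2y$, so the result should be $2A$.

```python
def matrix_gradient(c, A, h=1e-6):
    """Central-difference estimate of dc/dA = dc/dX + j dc/dY, as in Eq. (4).

    c is a real-valued function of a complex matrix A given as a list of rows.
    Each element is perturbed by +/- h in its real part and then in its
    imaginary part; the two difference quotients form the real and imaginary
    parts of the corresponding gradient entry.
    """
    K, M = len(A), len(A[0])
    G = [[0j] * M for _ in range(K)]
    for k in range(K):
        for m in range(M):
            parts = []
            for step in (h, 1j * h):             # real-part step, then imaginary-part step
                A_plus = [row[:] for row in A]
                A_minus = [row[:] for row in A]
                A_plus[k][m] += step
                A_minus[k][m] -= step
                parts.append((c(A_plus) - c(A_minus)) / (2 * h))
            G[k][m] = parts[0] + 1j * parts[1]
    return G


def frob2(A):
    """Squared Frobenius norm sum |a_km|^2; its gradient under Eq. (4) is 2A."""
    return sum(abs(x) ** 2 for row in A for x in row)


A = [[1 + 1j, 2 - 0.5j],
     [-0.3j, 0.7]]
G = matrix_gradient(frob2, A)
print(G)   # approximately 2A
```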
Note that when A is a vector, the matrix gradient reduces to the vector gradient.

We  use  the  term  ‘gradient’  to  refer  to  both  vector  and  matrix  gradients,  as  the  notation  will 
indicate  which  type  of  gradient  is  being  referred  to. 

There are many books and papers that discuss the gradient of a real-valued scalar function
of a vector or matrix variable. A well-known paper is [11]. A well-known book is
[12]. A more mathematical treatment is given in [13], the preface of which gives an extensive
bibliography on the theory and applications of matrix differential calculus. Appendix
A.7 of [3] and the appendix of Chapter 6 of [14] give gradients of some commonly encountered
scalar functions of vectors and matrices. All of the above, except [3], consider
only real-valued scalar functions of real-valued vector or matrix variables. The gradient of
a real-valued scalar function of a complex-valued vector is discussed in [3] (A.7.4) with
acknowledgement to [15].

2.3  Notation 

For a complex matrix A, we denote by $\operatorname{ran}(A)$ the range of A, i.e., the vector subspace
spanned by the columns of A [10] (Section 2.1.2). The orthogonal complement of $\operatorname{ran}(A)$
is denoted by $\operatorname{ran}^{\perp}(A)$.

2.4  Scope  of  Report 

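In the general context of the unique QR decomposition of Theorem 1 above, we first derive
a simple expression for $\frac{\partial}{\partial\mathcal{A}} r_i^2$. From this, we derive $\frac{\partial}{\partial\mathcal{A}} r_i$ and, more generally, $\frac{\partial}{\partial\mathcal{A}} f(r_i)$ for
any differentiable $f(\cdot)$. As an example of the latter, we derive $\frac{\partial}{\partial\mathcal{A}} \log r_i$, from which we
derive $\frac{\partial}{\partial\mathcal{A}} \log\det(\mathcal{R})$, which is equivalent to $\frac{1}{2}\frac{\partial}{\partial\mathcal{A}} \log\det(\mathcal{A}^H\mathcal{A})$. We also treat the general
case $\frac{\partial}{\partial\mathcal{A}} f(r_1, r_2, \ldots, r_M)$ for $f(\cdot)$ that has all partial derivatives.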
3 Gradient of Squared Norm of Orthogonal Projection


Let A be a $K \times L$ ($K > L$) matrix with rank L and let $\mathbf{a}$ be a $K \times 1$ vector. In this section,
we derive a result concerning the projection of $\mathbf{a}$ onto $\operatorname{ran}^{\perp}(A)$. In the next section,
we apply this result to the QR decomposition of (3) by taking A to be the matrix
of the first L columns of $\mathcal{A}$ and $\mathbf{a}$ to be the $(L+1)$th column of $\mathcal{A}$, for $L = 1, 2, \ldots, M-1$.

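The orthogonal projection matrix P onto $\operatorname{ran}^{\perp}(A)$ is

$$P = I_K - A(A^HA)^{-1}A^H \tag{5}$$

which is Hermitian ($P^H = P$) and idempotent ($P^2 = P$). Denoting by r the norm of the
orthogonal projection of $\mathbf{a}$ onto $\operatorname{ran}^{\perp}(A)$, we have

$$r^2 = \|P\mathbf{a}\|^2 \tag{6}$$

$$= \mathbf{a}^HP^HP\mathbf{a} \tag{7}$$

$$= \mathbf{a}^HP\mathbf{a}. \tag{8}$$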

First we derive the gradients of $r^2$ with respect to $\mathbf{a}$ and A, as expressions involving
$\mathbf{a}$ and A. Then we simplify the expressions using the QR decomposition of the
augmented matrix $[A, \mathbf{a}]$.

We  will  use  the  term  ‘gradient’  for  vector  and  matrix  gradients  when  the  context  is  clear. 
The gradient of $r^2$ with respect to $\mathbf{a}$ is (see Annex A, Eq. (A.1), with $B = P$)

$$\frac{\partial}{\partial\mathbf{a}} r^2 = 2P\mathbf{a}. \tag{9}$$

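The gradient of $r^2$ with respect to A is given by the following lemma, whose statement
makes use of the pseudo-inverse of A given by

$$A^{\#} = (A^HA)^{-1}A^H. \tag{10}$$

Lemma 1

$$\frac{\partial}{\partial A} r^2 = -2\left(I_K - A(A^HA)^{-1}A^H\right)\mathbf{a}\mathbf{a}^HA(A^HA)^{-1} \tag{11}$$

$$= -2(P\mathbf{a})(A^{\#}\mathbf{a})^H. \tag{12}$$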
Proof: See Annex B for the proof of (11). Then use (5) and (10) to get (12). □

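Note that when $\mathbf{a} \in \operatorname{ran}(A)$, we have $P\mathbf{a} = \mathbf{0}$ and hence $\frac{\partial}{\partial A} r^2 = 0$. To avoid this, we assume
henceforth that $\mathbf{a} \notin \operatorname{ran}(A)$. Then $\frac{\partial}{\partial A} r^2$ is the outer product of the nonzero column
vector $P\mathbf{a}$ and the row vector $(A^{\#}\mathbf{a})^H$, so it is a rank-one matrix whenever $A^{\#}\mathbf{a} \neq \mathbf{0}$,
i.e., unless $\mathbf{a}$ is orthogonal to $\operatorname{ran}(A)$.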
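Lemma 1 can be spot-checked numerically before it is simplified. The sketch below is our illustration (helper names and test data are hypothetical; $L = 2$ is chosen only so that $(A^HA)^{-1}$ has a closed form): it evaluates (12) directly from (5) and (10) and compares it with a central-difference gradient formed as in Definition 1, holding $\mathbf{a}$ fixed.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def herm(X):
    """Hermitian (conjugate) transpose."""
    return [[X[i][j].conjugate() for i in range(len(X))] for j in range(len(X[0]))]


def inv2(B):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [[B[1][1] / det, -B[0][1] / det], [-B[1][0] / det, B[0][0] / det]]


def projection_parts(A, a):
    """Return (P a, A^# a) from Eqs. (5) and (10); A is K x 2, a is K x 1."""
    pinv_a = matmul(matmul(inv2(matmul(herm(A), A)), herm(A)), a)   # A^# a
    A_pinv_a = matmul(A, pinv_a)                                     # A (A^H A)^{-1} A^H a
    Pa = [[a[k][0] - A_pinv_a[k][0]] for k in range(len(a))]         # P a
    return Pa, pinv_a


def r2(A, a):
    """Squared norm of the projection of a onto ran-perp(A), Eq. (6)."""
    Pa, _ = projection_parts(A, a)
    return sum(abs(row[0]) ** 2 for row in Pa)


A = [[1 + 0.5j, 0.2 - 1j], [0.3, 1 + 1j], [-0.7j, 0.4 + 0.1j]]
a = [[0.5 - 0.2j], [1j], [1.0 + 0.3j]]
Pa, pinv_a = projection_parts(A, a)
# Lemma 1, Eq. (12): -2 (P a)(A^# a)^H, a K x 2 outer product
lemma1 = [[-2 * Pa[k][0] * pinv_a[m][0].conjugate() for m in range(2)] for k in range(3)]

# Central-difference estimate of dr^2/dX + j dr^2/dY with a held fixed
h = 1e-6
numeric = [[0j, 0j] for _ in range(3)]
for k in range(3):
    for m in range(2):
        parts = []
        for step in (h, 1j * h):
            A_plus = [row[:] for row in A]
            A_minus = [row[:] for row in A]
            A_plus[k][m] += step
            A_minus[k][m] -= step
            parts.append((r2(A_plus, a) - r2(A_minus, a)) / (2 * h))
        numeric[k][m] = parts[0] + 1j * parts[1]

max_err = max(abs(numeric[k][m] - lemma1[k][m]) for k in range(3) for m in range(2))
print(max_err)   # small (finite-difference and round-off error only)
```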

In order to simplify (9) and (12), suppose A and $[A, \mathbf{a}]$ have the QR decompositions

$$A = QR \tag{13}$$

and

$$[A, \mathbf{a}] = \tilde{Q}\tilde{R}, \tag{14}$$

which are unique in the sense of Theorem 1.

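Then $\tilde{Q}$ and $\tilde{R}$ can be partitioned as [10] (Theorem 5.2.1)

$$\tilde{Q} = [Q, \mathbf{q}] \tag{15}$$

$$\tilde{R} = \begin{bmatrix} R & \mathbf{r} \\ \mathbf{0}^T & r \end{bmatrix} \tag{16}$$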

where $\mathbf{q}$ is a $K \times 1$ vector, $\mathbf{r}$ is an $L \times 1$ vector, and $r$ is a real-valued scalar which is equal to
the norm of the projection of $\mathbf{a}$ onto $\operatorname{ran}^{\perp}(A)$ defined in (6). The latter fact can be seen as
follows. Eqs. (14), (15), and (16) together show

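$$\mathbf{a} = Q\mathbf{r} + \mathbf{q}r \tag{17}$$

which is an orthogonal decomposition because

$$Q^H\mathbf{q} = \mathbf{0}. \tag{18}$$

Moreover, since $Q\mathbf{r} \in \operatorname{ran}(A)$, we have

$$\mathbf{q}r = P\mathbf{a} \tag{19}$$

which, together with $\|\mathbf{q}\| = 1$, shows that $r$ is the norm of the projection of $\mathbf{a}$ onto $\operatorname{ran}^{\perp}(A)$.
Using (19), the gradient of (9) can be written simply as

$$\frac{\partial}{\partial\mathbf{a}} r^2 = 2\mathbf{q}r. \tag{20}$$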


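To simplify (12), we need some extra algebra. Denote $S = R^{-1}$. Then S is upper triangular
[10] (Section 3.1.8, The Algebra of Triangular Matrices). Moreover, $\tilde{R}^{-1}$, which is also
upper triangular, can be partitioned as [9] (Section 0.7.3)

$$\tilde{R}^{-1} = \begin{bmatrix} R & \mathbf{r} \\ \mathbf{0}^T & r \end{bmatrix}^{-1} = \begin{bmatrix} S & \mathbf{s} \\ \mathbf{0}^T & s \end{bmatrix} \tag{21}$$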
where $\mathbf{s}$ is an $L \times 1$ vector and $s$ is a scalar given by

$$R\mathbf{s} = -\mathbf{r}s \tag{22}$$

$$rs = 1. \tag{23}$$


The  following  lemma  gives  a  simple  expression  for  the  matrix  gradient  of  (12). 


Lemma 2

$$\frac{\partial}{\partial A} r^2 = 2r^2\mathbf{q}\mathbf{s}^H. \tag{24}$$


Proof: 

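Using (13), the pseudo-inverse of (10) can be written as $A^{\#} = R^{-1}Q^H$. Using this and
(17), we get

$$A^{\#}\mathbf{a} = R^{-1}Q^H(Q\mathbf{r} + \mathbf{q}r) \tag{25}$$

$$= R^{-1}\mathbf{r} \tag{26}$$

$$= -r\mathbf{s} \tag{27}$$

where (26) uses $Q^HQ = I$ and (18), and the last step uses (22) and (23). Using (27) and (19) in (12), we get (24). □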

By  combining  (24)  and  (20),  we  obtain  the  following  result. 

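Lemma 3

$$\frac{\partial}{\partial[A, \mathbf{a}]} r^2 = 2r^2\mathbf{q}[\mathbf{s}^H, s]. \tag{28}$$

Proof: Using (23), (20) can be written as $\frac{\partial}{\partial\mathbf{a}} r^2 = 2r^2\mathbf{q}s$, where $s = 1/r$ is real. Combine this with (24). □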

Note that $[\mathbf{s}^H, s]$ is the Hermitian transpose of the last column of $\tilde{R}^{-1}$, whose partitioned
form is given by (21).
4 The Main Result


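Referring to Theorem 1 and Eq. (3), we first introduce the notation needed for stating our
main result. Denote by $\mathbf{q}_i$ the $i$th column of $\mathcal{Q}$. Denote $\mathcal{S} = \mathcal{R}^{-1}$ and denote by $\mathbf{s}_i$ the $i$th
column of $\mathcal{S}$. Thus

$$\mathcal{Q} = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_M] \tag{29}$$

$$\mathcal{R}^{-1} = \mathcal{S} = [\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_M]. \tag{30}$$

Note that $\mathcal{S}$ is also upper triangular [10] (Section 3.1.8). Recall that $\{r_i\}$ are the diagonal
elements of $\mathcal{R}$.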

The  following  theorem  is  the  main  result  of  this  paper. 


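Theorem 2 Referring to (3) and the above notation,

$$\frac{\partial}{\partial\mathcal{A}} r_i^2 = 2r_i^2\,\mathbf{q}_i\mathbf{s}_i^H \quad \text{for } i = 1, 2, \ldots, M. \tag{31}$$

□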
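Theorem 2 is easy to spot-check numerically. The sketch below is our own illustration, not code from this report: it uses a modified Gram-Schmidt QR (one of the methods listed in Section 2.1), inverts $\mathcal{R}$ by back-substitution, and compares a central-difference gradient of $r_i^2$, formed as in (4), with $2r_i^2\mathbf{q}_i\mathbf{s}_i^H$ for every $i$. The helper names and the test matrix are arbitrary choices.

```python
import math


def qr_gram_schmidt(A):
    """Unique thin QR of Theorem 1 (positive real diagonal) via modified Gram-Schmidt."""
    K, M = len(A), len(A[0])
    q_cols, R = [], [[0j] * M for _ in range(M)]
    for i in range(M):
        v = [A[k][i] for k in range(K)]
        for j, q in enumerate(q_cols):
            R[j][i] = sum(q[k].conjugate() * v[k] for k in range(K))
            v = [v[k] - R[j][i] * q[k] for k in range(K)]
        r_i = math.sqrt(sum(abs(x) ** 2 for x in v))
        R[i][i] = complex(r_i)
        q_cols.append([x / r_i for x in v])
    return [[q_cols[m][k] for m in range(M)] for k in range(K)], R


def inv_upper(R):
    """Inverse S = R^{-1} of a nonsingular upper-triangular matrix."""
    M = len(R)
    S = [[0j] * M for _ in range(M)]
    for j in range(M):                       # solve R s_j = e_j by back-substitution
        for i in range(j, -1, -1):
            acc = (1.0 if i == j else 0.0) - sum(R[i][k] * S[k][j] for k in range(i + 1, j + 1))
            S[i][j] = acc / R[i][i]
    return S


def r_squared(A, i):
    """r_i^2 as a function of A (0-based index i)."""
    return qr_gram_schmidt(A)[1][i][i].real ** 2


A = [[1 + 0.5j, 0.2 - 1j, 0.3],
     [0.3, 1 + 1j, -0.5j],
     [-0.7j, 0.4 + 0.1j, 1.2],
     [0.6, -0.2 + 0.8j, 0.1 + 0.1j]]
K, M = 4, 3
Q, R = qr_gram_schmidt(A)
S = inv_upper(R)

h = 1e-6
max_err = 0.0
for i in range(M):
    r_i = R[i][i].real
    for k in range(K):
        for m in range(M):
            # Eq. (31): element (k, m) of 2 r_i^2 q_i s_i^H
            predicted = 2 * r_i ** 2 * Q[k][i] * S[m][i].conjugate()
            parts = []
            for step in (h, 1j * h):          # d/dX, then d/dY
                A_plus = [row[:] for row in A]
                A_minus = [row[:] for row in A]
                A_plus[k][m] += step
                A_minus[k][m] -= step
                parts.append((r_squared(A_plus, i) - r_squared(A_minus, i)) / (2 * h))
            max_err = max(max_err, abs(parts[0] + 1j * parts[1] - predicted))
print(max_err)   # small (finite-difference and round-off error only)
```

Entries of the predicted gradient in columns $m > i$ vanish because $\mathcal{S}$ is upper triangular, in agreement with (32).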


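The proof rests on the fact that the QR decomposition can be computed via Gram-Schmidt
orthogonalization. Denote by $\mathbf{a}_i$ the $i$th column of $\mathcal{A}$ of (3). Denote $A_i = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_i]$ for
$i = 1, 2, \ldots, M$. Then $\mathbf{a}_1 = \mathbf{q}_1r_1$ and, for $i > 1$, $\mathbf{q}_ir_i$ is the orthogonal projection of $\mathbf{a}_i$ onto
$\operatorname{ran}^{\perp}(A_{i-1})$. Evidently, $r_i^2$ does not depend on $[\mathbf{a}_{i+1}, \mathbf{a}_{i+2}, \ldots, \mathbf{a}_M]$, and we have

$$\frac{\partial}{\partial\mathcal{A}} r_i^2 = \left[\frac{\partial}{\partial A_i} r_i^2,\ 0_{K\times(M-i)}\right]. \tag{32}$$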


Denote by $\{s_{ii} : i = 1, 2, \ldots, M\}$ the diagonal elements of $\mathcal{S}$. Then (cf. (23))

$$s_{ii}r_i = 1 \quad \text{for } i = 1, 2, \ldots, M. \tag{33}$$

The cases $i = 1$ and $i > 1$ are treated separately below.

Proof  for  Case  i  =  1 

Observe  that 

$$r_1^2 = \mathbf{a}_1^H\mathbf{a}_1 \tag{34}$$

$$\mathbf{a}_1 = \mathbf{q}_1r_1. \tag{35}$$
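Therefore,

$$\frac{\partial}{\partial\mathbf{a}_1} r_1^2 = \frac{\partial}{\partial\mathbf{a}_1}(\mathbf{a}_1^H\mathbf{a}_1) \tag{36}$$

$$= 2\mathbf{a}_1 \tag{37}$$

$$= 2\mathbf{q}_1r_1 \tag{38}$$

$$= 2\mathbf{q}_1r_1^2s_{11} \tag{39}$$

where (37) follows from (A.1) of Annex A with $B = I_K$, and in the last step we have used (33).
Combining this with (32), we get

$$\frac{\partial}{\partial\mathcal{A}} r_1^2 = 2\mathbf{q}_1r_1^2\,[s_{11}, \mathbf{0}_{1\times(M-1)}] \tag{40}$$

$$= 2\mathbf{q}_1r_1^2\,\mathbf{s}_1^H \tag{41}$$

where in the last step we have used the fact that $\mathcal{S}$ of (30) is upper triangular and $s_{11}$ is real. □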


Proof  for  Case  i  >  1 


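Denote by $R_i$ the leading $i \times i$ submatrix of $\mathcal{R}$, the upper triangular matrix of (3). Define
the partition

$$R_i = \begin{bmatrix} R_{i-1} & \bar{\mathbf{r}}_i \\ \mathbf{0}^T & r_i \end{bmatrix} \tag{42}$$

for $i = 2, 3, \ldots, M$, where we identify $R_1 = r_1$. Denote by $S_i$ the leading $i \times i$ submatrix of
$\mathcal{S}$, the upper triangular matrix of (30). Define the partition

$$S_i = \begin{bmatrix} S_{i-1} & \bar{\mathbf{s}}_i \\ \mathbf{0}^T & s_{ii} \end{bmatrix} \tag{43}$$

for $i = 2, 3, \ldots, M$, where we identify $S_1 = s_{11}$. Then $S_iR_i = I_i$ for $i = 1, 2, \ldots, M$ (cf. (21)).
Denote $Q_i = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_i]$ for $i = 1, 2, \ldots, M$.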


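Since $r_i$ is the norm of the orthogonal projection of $\mathbf{a}_i$ onto $\operatorname{ran}^{\perp}(A_{i-1})$, we invoke Lemma
3 of Section 3 by setting $A = A_{i-1}$, $\mathbf{a} = \mathbf{a}_i$, $Q = Q_{i-1}$, $\mathbf{q} = \mathbf{q}_i$, $R = R_{i-1}$, $\mathbf{r} = \bar{\mathbf{r}}_i$, $r = r_i$,
$S = S_{i-1}$, $\mathbf{s} = \bar{\mathbf{s}}_i$, $s = s_{ii}$, to get

$$\frac{\partial}{\partial A_i} r_i^2 = \frac{\partial}{\partial[A_{i-1}, \mathbf{a}_i]} r_i^2 \tag{44}$$

$$= 2\mathbf{q}_ir_i^2\,[\bar{\mathbf{s}}_i^H, s_{ii}]. \tag{45}$$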


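Observe that $[\bar{\mathbf{s}}_i^H, s_{ii}]$ is the $i$th row of $S_i^H$. Combining this with (32), we get

$$\frac{\partial}{\partial\mathcal{A}} r_i^2 = 2\mathbf{q}_ir_i^2\,[\bar{\mathbf{s}}_i^H, s_{ii}, \mathbf{0}_{1\times(M-i)}] \tag{46}$$

$$= 2\mathbf{q}_ir_i^2\,\mathbf{s}_i^H \tag{47}$$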
where in the last step we have used the fact that $\mathcal{S}$ of (30) is upper triangular.


□ 


The following corollary gives the gradient of $r_i$. This result is the subject of the title of this
paper.


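Corollary 1

$$\frac{\partial}{\partial\mathcal{A}} r_i = r_i\,\mathbf{q}_i\mathbf{s}_i^H. \tag{48}$$

Proof: By the chain rule of differentiation,

$$\frac{\partial}{\partial\mathcal{A}} r_i = \left(\frac{dr_i}{d(r_i^2)}\right)\frac{\partial}{\partial\mathcal{A}} r_i^2 \tag{49}$$

$$= \frac{1}{2r_i}\,2r_i^2\,\mathbf{q}_i\mathbf{s}_i^H \tag{50}$$

$$= r_i\,\mathbf{q}_i\mathbf{s}_i^H. \tag{51}$$

□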


More generally, the gradient of any differentiable function of $r_i$ is given by the following
corollary.

Corollary 2 Let the real-valued function $f(x)$ be differentiable in $(0, \infty)$ and let $f'(x)$ be
its derivative. Then

$$\frac{\partial}{\partial\mathcal{A}} f(r_i) = f'(r_i)\,r_i\,\mathbf{q}_i\mathbf{s}_i^H. \tag{52}$$

□
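Two quick instances of (52) may serve as sanity checks. Taking $f(x) = x^2$, so that $f'(x) = 2x$, recovers Theorem 2; the second choice, $f(x) = \log(1 + x^2)$, is only an illustrative example of ours and is not used in this report:

$$\frac{\partial}{\partial\mathcal{A}} r_i^2 = (2r_i)\,r_i\,\mathbf{q}_i\mathbf{s}_i^H = 2r_i^2\,\mathbf{q}_i\mathbf{s}_i^H,$$

$$\frac{\partial}{\partial\mathcal{A}} \log(1 + r_i^2) = \frac{2r_i}{1 + r_i^2}\,r_i\,\mathbf{q}_i\mathbf{s}_i^H = \frac{2r_i^2}{1 + r_i^2}\,\mathbf{q}_i\mathbf{s}_i^H.$$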


As an example, we have

$$\frac{\partial}{\partial\mathcal{A}} \log r_i = \mathbf{q}_i\mathbf{s}_i^H. \tag{53}$$

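Weighted linear combinations of the above gradients can be taken over $i = 1, 2, \ldots, M$. Let
$\{\theta_i : i = 1, 2, \ldots, M\}$ be a set of real weights, and denote the diagonal matrix

$$\Theta = \operatorname{diag}(\theta_1, \theta_2, \ldots, \theta_M). \tag{54}$$

Denote the diagonal matrix

$$\bar{R} = \operatorname{diag}(r_1, r_2, \ldots, r_M) = \operatorname{diag}(\mathcal{R}), \tag{55}$$

i.e., the diagonal part of $\mathcal{R}$.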
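By taking weighted sums of (31), we get

$$\frac{\partial}{\partial\mathcal{A}}\sum_{i=1}^{M}\theta_ir_i^2 = 2\sum_{i=1}^{M}\theta_ir_i^2\,\mathbf{q}_i\mathbf{s}_i^H \tag{56}$$

$$= 2\mathcal{Q}\Theta\bar{R}^2\mathcal{S}^H. \tag{57}$$

Similarly, by taking weighted sums of (48), we get

$$\frac{\partial}{\partial\mathcal{A}}\sum_{i=1}^{M}\theta_ir_i = \sum_{i=1}^{M}\theta_ir_i\,\mathbf{q}_i\mathbf{s}_i^H \tag{58}$$

$$= \mathcal{Q}\Theta\bar{R}\mathcal{S}^H. \tag{59}$$

By directly summing (53) over $i = 1, 2, \ldots, M$, we get

$$\frac{\partial}{\partial\mathcal{A}}\log\det(\mathcal{R}) = \frac{\partial}{\partial\mathcal{A}}\log\prod_{i=1}^{M}r_i \tag{60}$$

$$= \sum_{i=1}^{M}\frac{\partial}{\partial\mathcal{A}}\log r_i \tag{61}$$

$$= \mathcal{Q}\mathcal{S}^H. \tag{62}$$

This can also be written as

$$\frac{\partial}{\partial\mathcal{A}}\log\det(\mathcal{A}^H\mathcal{A}) = 2\mathcal{Q}\mathcal{S}^H \tag{63}$$

since $\mathcal{A}^H\mathcal{A} = \mathcal{R}^H\mathcal{R}$ and $\det(\mathcal{A}^H\mathcal{A}) = \det(\mathcal{R})^2$.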

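For a square ($K = M$) and nonsingular $\mathcal{A}$, (62) reduces to

$$\frac{\partial}{\partial\mathcal{A}}\log|\det(\mathcal{A})| = \mathcal{A}^{-H}, \tag{64}$$

since in this case $\mathcal{Q}$ is unitary, $|\det(\mathcal{A})| = \det(\mathcal{R})$, and $\mathcal{Q}\mathcal{S}^H = (\mathcal{R}^{-1}\mathcal{Q}^H)^H = \mathcal{A}^{-H}$.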
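Equation (64) can be spot-checked without any QR code, because for a $2 \times 2$ matrix both $\log|\det\mathcal{A}|$ and $\mathcal{A}^{-1}$ have closed forms. The following sketch is a hedged illustration (the helper name and the test matrix are arbitrary): it compares a central-difference gradient, formed as in (4), with $\mathcal{A}^{-H}$.

```python
import math


def log_abs_det2(A):
    """log |det A| for a 2 x 2 complex matrix stored as a list of rows."""
    return math.log(abs(A[0][0] * A[1][1] - A[0][1] * A[1][0]))


A = [[1 + 2j, 0.5 - 0.3j],
     [-0.4 + 1j, 2 - 1j]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det, -A[0][1] / det],
         [-A[1][0] / det, A[0][0] / det]]
A_inv_H = [[A_inv[j][i].conjugate() for j in range(2)] for i in range(2)]  # A^{-H}

h = 1e-6
max_err = 0.0
for k in range(2):
    for m in range(2):
        parts = []
        for step in (h, 1j * h):                 # d/dX, then d/dY, per Eq. (4)
            A_plus = [row[:] for row in A]
            A_minus = [row[:] for row in A]
            A_plus[k][m] += step
            A_minus[k][m] -= step
            parts.append((log_abs_det2(A_plus) - log_abs_det2(A_minus)) / (2 * h))
        max_err = max(max_err, abs(parts[0] + 1j * parts[1] - A_inv_H[k][m]))
print(max_err)   # small (finite-difference and round-off error only)
```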


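More generally, let $f(r_1, r_2, \ldots, r_M)$ be a real-valued function, and denote

$$\bar{F} = \operatorname{diag}\left(\frac{\partial f}{\partial r_1}, \frac{\partial f}{\partial r_2}, \ldots, \frac{\partial f}{\partial r_M}\right). \tag{65}$$

Then, by the chain rule of differentiation,

$$\frac{\partial}{\partial\mathcal{A}} f(r_1, r_2, \ldots, r_M) = \sum_{i=1}^{M}\frac{\partial f}{\partial r_i}\,\frac{\partial}{\partial\mathcal{A}} r_i \tag{66}$$

$$= \sum_{i=1}^{M}\frac{\partial f}{\partial r_i}\,r_i\,\mathbf{q}_i\mathbf{s}_i^H \tag{67}$$

$$= \mathcal{Q}\bar{F}\bar{R}\mathcal{S}^H. \tag{68}$$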
4.1 The Real Case


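When the given matrix $\mathcal{A}$ of (3) is real, its unique QR decomposition $\mathcal{A} = \mathcal{Q}\mathcal{R}$ yields real
matrices $\mathcal{Q}$ and $\mathcal{R}$. Therefore, the above expressions for gradients can be used with the
Hermitian transpose interpreted as the ordinary transpose. Thus

$$\frac{\partial}{\partial\mathcal{A}} r_i = r_i\,\mathbf{q}_i\mathbf{s}_i^T. \tag{69}$$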
5 Application to MIMO Communications


In multiple-input multiple-output (MIMO) decision-feedback (DF) communication system
theory, the optimum receiver for a given transmitter and channel can be described easily via
the QR decomposition of the transmitter-channel composite matrix [6] [5] [7]. Moreover,
when the optimum receiver is used, the values $\{r_i^2\}$ represent the signal-to-noise ratios
(SNRs) for the parallel data streams that are being communicated.

Problems  of  optimizing  the  transmitter-receiver  pair  of  a  MIMO  DF  system  generally  fall 
into  two  categories: 

1. maximize performance as measured by a function $f(r_1, r_2, \ldots, r_M)$ subject to a constraint
on the transmitter power;

2. minimize the transmitter power subject to constraints on $\{r_i\}$ or functions thereof.

To derive first-order optimality conditions for these problems, we need an expression for
the matrix gradient of $f(r_1, r_2, \ldots, r_M)$ or $\{r_i\}$ with respect to the elements of the transmitter
matrix. Such an expression can be obtained from (48) and the chain rule of differentiation;
note that (48) gives the gradient with respect to the transmitter-channel composite matrix.
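As a hedged illustration of that chain-rule step, suppose (our assumption; the report does not fix the form of the composite matrix) that $\mathcal{A} = HT$, where the symbols $H$ (a fixed channel matrix) and $T$ (the transmitter matrix) are hypothetical. For a real-valued $c$, the gradient of Definition 1 satisfies $dc = \operatorname{Re}\,\operatorname{tr}\big((\partial c/\partial\mathcal{A})^H d\mathcal{A}\big)$, and $d\mathcal{A} = H\,dT$, so

$$dc = \operatorname{Re}\,\operatorname{tr}\!\left(\left(\frac{\partial c}{\partial\mathcal{A}}\right)^H H\,dT\right) = \operatorname{Re}\,\operatorname{tr}\!\left(\left(H^H\frac{\partial c}{\partial\mathcal{A}}\right)^H dT\right) \quad\Longrightarrow\quad \frac{\partial c}{\partial T} = H^H\,\frac{\partial c}{\partial\mathcal{A}}.$$

For example, with $c = \sum_i \theta_i r_i$ and (48), this sketch gives $\partial c/\partial T = H^H\sum_i \theta_i r_i\,\mathbf{q}_i\mathbf{s}_i^H$.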

In  fact,  an  expression  for  the  matrix  gradient  of  a  weighted  sum  of  the  SNRs  was  derived 
in  [7]  (Appendix  A)  and  used  to  optimize  the  transmitter  and  Zero-Forcing  (ZF)  receiver 
of  a  MIMO  DF  system.  That  derivation  directly  deals  with  the  QR  decomposition  of  the 
transmitter-channel  composite  matrix,  and  therefore  is  complicated.  Moreover,  it  hides 
the general results that would apply in other situations, e.g., a MIMO DF system with a
Minimum  Mean  Square  Error  (MMSE)  receiver.  The  derivations  of  the  present  report  are 
not  only  much  simpler  but  also  expose  the  general  results.  Using  the  results  presented 
herein,  gradients  of  general  performance  measures,  w.r.t.  the  transmitter  matrix,  can  be 
derived  easily  for  both  ZF  and  MMSE  receivers. 
6 Conclusion


6.1  Summary 

For a full-column-rank matrix $\mathcal{A}$, we considered its unique QR decomposition $\mathcal{A} = \mathcal{Q}\mathcal{R}$,
such that $\mathcal{Q}^H\mathcal{Q} = I$ and $\mathcal{R}$ is square upper triangular with positive diagonal elements, say
$r_1, r_2, \ldots, r_M$.

Treating $r_i$ as a function of the elements of $\mathcal{A}$, we derived a simple expression for the matrix
gradient of $r_i^2$ (Theorem 2). We then combined this with the chain rule of differentiation
to obtain simple expressions for the matrix gradients of $r_i$ (Corollary 1) and a general
differentiable real-valued function $f(r_i)$ (Corollary 2). As an example of the latter, we
derived the gradient of $\log r_i$ (53). Using this, we derived the gradient of $\log\det(\mathcal{A}^H\mathcal{A})$
(63). By combining Corollary 1 with the chain rule, we also derived the gradient of a
general real-valued differentiable function $f(r_1, r_2, \ldots, r_M)$ (68).

We  noted  how  the  main  result  of  this  report  may  be  used  to  optimize  the  transmitter-receiver 
pair  of  a  MIMO  DF  communication  system. 

6.2  Suggestions  for  Future  Work 

Expressions should be derived for the gradients of all elements of $\mathcal{Q}$ and $\mathcal{R}$, as these will be useful in evaluating the effects of noise in $\mathcal{A}$ by means of stochastic perturbation theory.

It  would  be  worthwhile  to  demonstrate  the  utility  of  the  results  of  this  report  in  optimizing 
the  transmitter-receiver  pair  of  a  MIMO  DF  communication  system  under  various  criteria. 
Annex  A:  Gradient  of  a  Quadratic  Form


For a complex-valued vector $a$ and a Hermitian-symmetric matrix $B$, consider the real-valued quadratic form $a^H B a$. Here we derive the gradient

$$\frac{\partial}{\partial a}\left(a^H B a\right) = 2Ba. \tag{A.1}$$

Although  this  is  a  simple  result,  its  derivation  will  be  a  good  introduction  to  the  somewhat 
complicated  derivations  in  Annex  B.  This  result  is  used  to  get  (9)  and  (37). 


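For any generic real-valued scalar variable $\alpha$, which can be the real or imaginary part of any element of $a$, we can write, by the product rule of differentiation,

$$\frac{\partial}{\partial\alpha}\left(a^H B a\right) = \left(\frac{\partial a}{\partial\alpha}\right)^H B a + a^H B \left(\frac{\partial a}{\partial\alpha}\right). \tag{A.2}$$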


Let $a = x + jy$ be the real-imaginary decomposition of $a$. Let $x_n$ be the $n$th element of $x$, and similarly for $y_n$. For a general vector $z$, we denote by $z|_n$ the $n$th element of $z$.

We shall use the following facts below. For $\alpha = x_n$, $\partial a/\partial\alpha$ is a vector that has 1 at position $n$ and zeros elsewhere. Similarly, for $\alpha = y_n$, $\partial a/\partial\alpha$ is a vector that has $j$ at position $n$ and zeros elsewhere.


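Setting $\alpha = x_n$, we have by (A.2), using $a^H B = (Ba)^H$ (which holds because $B$ is Hermitian),

\begin{align}
\frac{\partial}{\partial x_n}\left(a^H B a\right) &= (Ba)|_n + (Ba)|_n^* \tag{A.3}\\
&= 2\,\Re\left((Ba)|_n\right). \tag{A.4}
\end{align}

Letting $\alpha$ vary over $x$, we have

$$\frac{\partial}{\partial x}\left(a^H B a\right) = 2\,\Re(Ba). \tag{A.5}$$

Setting $\alpha = y_n$, we have by (A.2)

\begin{align}
\frac{\partial}{\partial y_n}\left(a^H B a\right) &= -j\,(Ba)|_n + j\,(Ba)|_n^* \tag{A.6}\\
&= 2\,\Im\left((Ba)|_n\right). \tag{A.7}
\end{align}

Letting $\alpha$ vary over $y$, we have

$$\frac{\partial}{\partial y}\left(a^H B a\right) = 2\,\Im(Ba). \tag{A.8}$$

By combining (A.5) and (A.8) according to Definition 1 of Section 2, we get (A.1).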
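A quick numerical check of (A.1), not part of the original derivation, can be written in plain Python. The sketch forms a random Hermitian $B$, estimates $\partial/\partial x$ and $\partial/\partial y$ of $a^H B a$ by central differences, combines them as $\partial/\partial x + j\,\partial/\partial y$ (the way (A.5) and (A.8) are combined above), and compares the result with $2Ba$. The helper names, the seed, and the dimension 4 are arbitrary choices for this sketch.

```python
import random


def matmul(P, Q):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def herm(P):
    """Conjugate (Hermitian) transpose P^H."""
    return [[P[i][j].conjugate() for i in range(len(P))] for j in range(len(P[0]))]


def crandn(rng, rows, cols):
    """rows x cols matrix with i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def numeric_grad(f, Z, h=1e-6):
    """Definition 1 gradient (d/dX + j d/dY) of a real-valued f, by central differences."""
    G = [[0j] * len(Z[0]) for _ in Z]
    for m in range(len(Z)):
        for n in range(len(Z[0])):
            # Perturb the real part (step h), then the imaginary part (step jh).
            for step, unit in ((h, 1), (1j * h, 1j)):
                Zp = [row[:] for row in Z]
                Zm = [row[:] for row in Z]
                Zp[m][n] += step
                Zm[m][n] -= step
                G[m][n] += unit * (f(Zp) - f(Zm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


rng = random.Random(2)
C = crandn(rng, 4, 4)
B = [[C[i][j] + C[j][i].conjugate() for j in range(4)] for i in range(4)]  # Hermitian by construction
a = crandn(rng, 4, 1)  # column vector stored as a 4 x 1 matrix


def quad(v):
    """Quadratic form v^H B v; real because B is Hermitian (imaginary part is rounding only)."""
    return matmul(herm(v), matmul(B, v))[0][0].real


analytic = [[2 * z for z in row] for row in matmul(B, a)]  # right-hand side of (A.1)
numeric = numeric_grad(quad, a)
err = max_abs_diff(analytic, numeric)
print(f"max |2Ba - finite difference| = {err:.2e}")
```

Because the quadratic form is quadratic in the real and imaginary parts of $a$, central differences are exact here apart from rounding.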
Annex  B:  Proof  of  Lemma  1  of  Section  3


This annex proves (11) of Lemma 1. From (5) and (8), we have

$$r^2 = a^H a - a^H A (A^H A)^{-1} A^H a. \tag{B.1}$$
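The right-hand side of (B.1) is the squared norm of the component of $a$ orthogonal to the columns of $A$. A standard property of the QR decomposition is that this also equals the square of the last diagonal element of $R$ for the stacked matrix $[A \;\; a]$. Assuming this is the way $r$ arises from (5) and (8), the Python sketch below compares the two numerically, computing the diagonal of $R$ by modified Gram-Schmidt. The function `qr_diag` and the other helpers are hypothetical names introduced here, and the sizes $K = 5$, $M = 3$ are arbitrary.

```python
import random


def matmul(P, Q):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def herm(P):
    """Conjugate (Hermitian) transpose P^H."""
    return [[P[i][j].conjugate() for i in range(len(P))] for j in range(len(P[0]))]


def inv(P):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(P)
    M = [list(row) + [complex(i == j) for j in range(n)] for i, row in enumerate(P)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def crandn(rng, rows, cols):
    """rows x cols matrix with i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def qr_diag(S):
    """Diagonal of R in S = QR (positive diagonal), via modified Gram-Schmidt."""
    cols = [[S[i][j] for i in range(len(S))] for j in range(len(S[0]))]
    basis, diag = [], []
    for v in cols:
        for u in basis:
            coef = sum(ui.conjugate() * vi for ui, vi in zip(u, v))  # u^H v
            v = [vi - coef * ui for ui, vi in zip(u, v)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        diag.append(norm)
        basis.append([vi / norm for vi in v])
    return diag


rng = random.Random(3)
K, M = 5, 3
A = crandn(rng, K, M)
a = crandn(rng, K, 1)
AH, aH = herm(A), herm(a)

# Right-hand side of (B.1): a^H a - a^H A (A^H A)^{-1} A^H a.
proj_term = matmul(matmul(matmul(aH, A), inv(matmul(AH, A))), matmul(AH, a))[0][0]
r2_formula = (matmul(aH, a)[0][0] - proj_term).real

# Last diagonal element of R for the stacked K x (M + 1) matrix [A a].
stacked = [A[i] + a[i] for i in range(K)]
r_last = qr_diag(stacked)[-1]
print(f"(B.1): {r2_formula:.12f}   QR: {r_last ** 2:.12f}")
```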


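The gradient, w.r.t. $A$, of the first term of (B.1) is zero. The gradient of the second term of (B.1) can be obtained by the product rule of differentiation as follows. For any generic real-valued scalar variable $\alpha$, which can be the real or imaginary part of any element of $A$, we can write

$$\frac{\partial}{\partial\alpha}\left(a^H A (A^H A)^{-1} A^H a\right) = d_1(\alpha) + d_2(\alpha) + d_3(\alpha) \tag{B.2}$$

where

\begin{align}
d_1(\alpha) &= a^H \left(\frac{\partial A}{\partial\alpha}\right) (A^H A)^{-1} A^H a \tag{B.3}\\
d_2(\alpha) &= a^H A \left(\frac{\partial (A^H A)^{-1}}{\partial\alpha}\right) A^H a \tag{B.4}\\
d_3(\alpha) &= a^H A (A^H A)^{-1} \left(\frac{\partial A^H}{\partial\alpha}\right) a. \tag{B.5}
\end{align}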


Let $A = X + jY$ be the real-imaginary decomposition of $A$. In the following subsections, we shall evaluate $d_1(\alpha)$, $d_2(\alpha)$, and $d_3(\alpha)$ when $\alpha$ varies over $X$ and $Y$. Towards this, we denote

\begin{align}
b &= (A^H A)^{-1} A^H a \tag{B.6}\\
c &= A b. \tag{B.7}
\end{align}
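The vectors $b$ and $c$ have a standard least-squares interpretation: $b$ minimizes $\|a - Ab\|$, so $c = Ab$ is the orthogonal projection of $a$ onto the column space of $A$, and $a - c$ is orthogonal to every column of $A$. The Python sketch below checks this numerically and also confirms that $\|a - c\|^2$ equals the right-hand side of (B.1). The sizes, seed, and helper names are our own choices.

```python
import random


def matmul(P, Q):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def herm(P):
    """Conjugate (Hermitian) transpose P^H."""
    return [[P[i][j].conjugate() for i in range(len(P))] for j in range(len(P[0]))]


def inv(P):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(P)
    M = [list(row) + [complex(i == j) for j in range(n)] for i, row in enumerate(P)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def crandn(rng, rows, cols):
    """rows x cols matrix with i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(4)
K, M = 5, 3
A = crandn(rng, K, M)
a = crandn(rng, K, 1)
AH = herm(A)

b = matmul(inv(matmul(AH, A)), matmul(AH, a))    # (B.6)
c = matmul(A, b)                                 # (B.7)
resid = [[a[i][0] - c[i][0]] for i in range(K)]  # a - c as a K x 1 matrix

# a - c is orthogonal to every column of A, i.e. c projects a onto range(A).
orth = max(abs(row[0]) for row in matmul(AH, resid))

# Hence ||a - c||^2 = a^H a - a^H c, which is the right-hand side of (B.1).
norm2_resid = sum(abs(row[0]) ** 2 for row in resid)
r2_formula = (matmul(herm(a), a)[0][0] - matmul(herm(a), c)[0][0]).real
print(f"max |A^H (a - c)| = {orth:.2e}")
print(f"||a - c||^2 = {norm2_resid:.12f}, (B.1) = {r2_formula:.12f}")
```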


We shall use the following facts below. For $\alpha = x_{m,n}$, $\partial A/\partial\alpha$ is a matrix that has 1 at position $(m,n)$ and zeros elsewhere. Similarly, for $\alpha = y_{m,n}$, $\partial A/\partial\alpha$ is a matrix that has $j$ at position $(m,n)$ and zeros elsewhere.


Notation

For a vector $z$, we denote by $z|_n$ the $n$th element of $z$.

For a matrix $Z$, we denote by $Z|_{m,n}$ the $(m,n)$th element of $Z$.
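$d_1$ over $X$ and $Y$

Using (B.6) in (B.3), we have

$$d_1(\alpha) = a^H \left(\frac{\partial A}{\partial\alpha}\right) b. \tag{B.8}$$

Setting $\alpha = x_{m,n}$, we have

\begin{align}
d_1(x_{m,n}) &= a|_m^*\, b|_n \tag{B.9}\\
&= (ab^H)|_{m,n}^*. \tag{B.10}
\end{align}

Setting $\alpha = y_{m,n}$, we have

\begin{align}
d_1(y_{m,n}) &= j\, a|_m^*\, b|_n \tag{B.11}\\
&= j\,(ab^H)|_{m,n}^*. \tag{B.12}
\end{align}

$d_3$ over $X$ and $Y$

Using (B.6) in (B.5), we have

$$d_3(\alpha) = b^H \left(\frac{\partial A^H}{\partial\alpha}\right) a. \tag{B.13}$$

Setting $\alpha = x_{m,n}$, we have

\begin{align}
d_3(x_{m,n}) &= a|_m\, b|_n^* \tag{B.14}\\
&= (ab^H)|_{m,n}. \tag{B.15}
\end{align}

Setting $\alpha = y_{m,n}$, we have

\begin{align}
d_3(y_{m,n}) &= -j\, a|_m\, b|_n^* \tag{B.16}\\
&= -j\,(ab^H)|_{m,n}. \tag{B.17}
\end{align}

$d_1 + d_3$ over $X$ and $Y$

Combining (B.10) and (B.15), we get

\begin{align}
d_1(x_{m,n}) + d_3(x_{m,n}) &= (ab^H)|_{m,n}^* + (ab^H)|_{m,n} \tag{B.18}\\
&= 2\,\Re\left((ab^H)|_{m,n}\right). \tag{B.19}
\end{align}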
Combining (B.12) and (B.17), we get

\begin{align}
d_1(y_{m,n}) + d_3(y_{m,n}) &= j\,(ab^H)|_{m,n}^* - j\,(ab^H)|_{m,n} \tag{B.20}\\
&= 2\,\Im\left((ab^H)|_{m,n}\right). \tag{B.21}
\end{align}
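Holding the middle factor $(A^H A)^{-1}$ fixed at its value at some point $A_0$ isolates $d_1 + d_3$. Combined as $\partial/\partial X + j\,\partial/\partial Y$ (the Definition 1 convention, as used in Annex A), (B.19) and (B.21) then predict that the gradient of $a^H A (A_0^H A_0)^{-1} A^H a$ at $A = A_0$ is $2ab^H$. The Python sketch below compares this with central differences. The function `g`, the other helper names, and the sizes are hypothetical choices made for this check.

```python
import random


def matmul(P, Q):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def herm(P):
    """Conjugate (Hermitian) transpose P^H."""
    return [[P[i][j].conjugate() for i in range(len(P))] for j in range(len(P[0]))]


def inv(P):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(P)
    M = [list(row) + [complex(i == j) for j in range(n)] for i, row in enumerate(P)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def crandn(rng, rows, cols):
    """rows x cols matrix with i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def numeric_grad(f, Z, h=1e-6):
    """Definition 1 gradient (d/dX + j d/dY) of a real-valued f, by central differences."""
    G = [[0j] * len(Z[0]) for _ in Z]
    for m in range(len(Z)):
        for n in range(len(Z[0])):
            for step, unit in ((h, 1), (1j * h, 1j)):  # real part, then imaginary part
                Zp = [row[:] for row in Z]
                Zm = [row[:] for row in Z]
                Zp[m][n] += step
                Zm[m][n] -= step
                G[m][n] += unit * (f(Zp) - f(Zm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


rng = random.Random(5)
K, M = 5, 3
A0 = crandn(rng, K, M)
a = crandn(rng, K, 1)
aH = herm(a)
G_inv = inv(matmul(herm(A0), A0))       # (A0^H A0)^{-1}, held fixed below
b = matmul(G_inv, matmul(herm(A0), a))  # (B.6) evaluated at A0


def g(A):
    """a^H A (A0^H A0)^{-1} A^H a with the inverse frozen; its derivative is d1 + d3."""
    return matmul(matmul(matmul(aH, A), G_inv), matmul(herm(A), a))[0][0].real


# (B.19) and (B.21) together: the Definition 1 gradient of g at A0 is 2 a b^H.
analytic = [[2 * a[m][0] * b[n][0].conjugate() for n in range(M)] for m in range(K)]
numeric = numeric_grad(g, A0)
err = max_abs_diff(analytic, numeric)
print(f"max |2ab^H - finite difference| = {err:.2e}")
```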


Simplification of $d_2(\alpha)$

Suppose $Z(\alpha)$ is a non-singular matrix-valued function of the real-valued scalar $\alpha$. Then, by applying the product rule of differentiation to the identity

$$Z(\alpha)\, Z^{-1}(\alpha) = I \tag{B.22}$$

and rearranging the terms, we get the well-known result

$$\frac{\partial Z^{-1}}{\partial\alpha} = -Z^{-1}(\alpha)\left(\frac{\partial Z}{\partial\alpha}\right) Z^{-1}(\alpha). \tag{B.23}$$
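A finite-difference check of (B.23) is sketched below in Python along the affine path $Z(\alpha) = Z_0 + \alpha Z_1$, so that $\partial Z/\partial\alpha = Z_1$. The choice of path, the diagonal shift used to keep $Z_0$ well conditioned, and the helper names are assumptions of this sketch, not part of the report.

```python
import random


def matmul(P, Q):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def inv(P):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(P)
    M = [list(row) + [complex(i == j) for j in range(n)] for i, row in enumerate(P)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def crandn(rng, rows, cols):
    """rows x cols matrix with i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


rng = random.Random(6)
n = 3
Z0 = crandn(rng, n, n)
for i in range(n):
    Z0[i][i] += 5  # diagonal shift (our choice) keeps Z(alpha) well conditioned near 0
Z1 = crandn(rng, n, n)  # dZ/dalpha along the affine path


def Z(alpha):
    """Point on the hypothetical affine path Z0 + alpha * Z1."""
    return [[Z0[i][j] + alpha * Z1[i][j] for j in range(n)] for i in range(n)]


h = 1e-6
# Central-difference estimate of dZ^{-1}/dalpha at alpha = 0.
numeric = [[(p - q) / (2 * h) for p, q in zip(rp, rq)]
           for rp, rq in zip(inv(Z(h)), inv(Z(-h)))]
# Right-hand side of (B.23) at alpha = 0.
Z0_inv = inv(Z0)
analytic = [[-v for v in row] for row in matmul(matmul(Z0_inv, Z1), Z0_inv)]
err = max_abs_diff(analytic, numeric)
print(f"max |(B.23) - finite difference| = {err:.2e}")
```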


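Setting $Z = A^H A$, we get

$$\frac{\partial (A^H A)^{-1}}{\partial\alpha} = -(A^H A)^{-1}\left(\frac{\partial (A^H A)}{\partial\alpha}\right)(A^H A)^{-1}. \tag{B.24}$$

By applying the product rule to the middle term, we get

$$\frac{\partial (A^H A)^{-1}}{\partial\alpha} = -(A^H A)^{-1}\left[\left(\frac{\partial A^H}{\partial\alpha}\right) A + A^H\left(\frac{\partial A}{\partial\alpha}\right)\right](A^H A)^{-1}. \tag{B.25}$$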


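Using the above in (B.4), we get

$$d_2(\alpha) = -a^H A (A^H A)^{-1}\left[\left(\frac{\partial A^H}{\partial\alpha}\right) A + A^H\left(\frac{\partial A}{\partial\alpha}\right)\right](A^H A)^{-1} A^H a. \tag{B.26}$$

Using (B.6) in the above, and noting that $b^H = a^H A (A^H A)^{-1}$ because $A^H A$ is Hermitian, we get

$$d_2(\alpha) = -b^H\left[\left(\frac{\partial A^H}{\partial\alpha}\right) A + A^H\left(\frac{\partial A}{\partial\alpha}\right)\right] b. \tag{B.27}$$

Using (B.7) in the above, and changing sign, we get

$$-d_2(\alpha) = b^H\left(\frac{\partial A^H}{\partial\alpha}\right) c + c^H\left(\frac{\partial A}{\partial\alpha}\right) b. \tag{B.28}$$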
$d_2$ over $X$ and $Y$

Note that the first term of the right-hand side of (B.28) is similar to $d_3(\alpha)$ of (B.13) and the second term is similar to $d_1(\alpha)$ of (B.8).

Setting $\alpha = x_{m,n}$, we have

\begin{align}
-d_2(x_{m,n}) &= (cb^H)|_{m,n} + (cb^H)|_{m,n}^* \tag{B.29}\\
&= 2\,\Re\left((cb^H)|_{m,n}\right). \tag{B.30}
\end{align}

Setting $\alpha = y_{m,n}$, we have

\begin{align}
-d_2(y_{m,n}) &= -j\,(cb^H)|_{m,n} + j\,(cb^H)|_{m,n}^* \tag{B.31}\\
&= 2\,\Im\left((cb^H)|_{m,n}\right). \tag{B.32}
\end{align}
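Conversely, holding the outer factors fixed at a point $A_0$ and letting only $(A^H A)^{-1}$ vary isolates $d_2$. By (B.30) and (B.32), combined as $\partial/\partial X + j\,\partial/\partial Y$ (the Definition 1 convention), the gradient of $a^H A_0 (A^H A)^{-1} A_0^H a$ at $A = A_0$ should be $-2cb^H$. The Python sketch below checks this numerically. The function `k`, the fixed factors `u` and `w`, and the other helper names are our own.

```python
import random


def matmul(P, Q):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def herm(P):
    """Conjugate (Hermitian) transpose P^H."""
    return [[P[i][j].conjugate() for i in range(len(P))] for j in range(len(P[0]))]


def inv(P):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(P)
    M = [list(row) + [complex(i == j) for j in range(n)] for i, row in enumerate(P)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def crandn(rng, rows, cols):
    """rows x cols matrix with i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def numeric_grad(f, Z, h=1e-6):
    """Definition 1 gradient (d/dX + j d/dY) of a real-valued f, by central differences."""
    G = [[0j] * len(Z[0]) for _ in Z]
    for m in range(len(Z)):
        for n in range(len(Z[0])):
            for step, unit in ((h, 1), (1j * h, 1j)):  # real part, then imaginary part
                Zp = [row[:] for row in Z]
                Zm = [row[:] for row in Z]
                Zp[m][n] += step
                Zm[m][n] -= step
                G[m][n] += unit * (f(Zp) - f(Zm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


rng = random.Random(7)
K, M = 5, 3
A0 = crandn(rng, K, M)
a = crandn(rng, K, 1)
u = matmul(herm(a), A0)                   # a^H A0, held fixed (1 x M)
w = matmul(herm(A0), a)                   # A0^H a, held fixed (M x 1)
b = matmul(inv(matmul(herm(A0), A0)), w)  # (B.6) at A0
c = matmul(A0, b)                         # (B.7) at A0


def k(A):
    """a^H A0 (A^H A)^{-1} A0^H a: only the inverse varies, so its derivative is d2."""
    return matmul(matmul(u, inv(matmul(herm(A), A))), w)[0][0].real


# (B.30) and (B.32) together: the Definition 1 gradient of k at A0 is -2 c b^H.
analytic = [[-2 * c[m][0] * b[n][0].conjugate() for n in range(M)] for m in range(K)]
numeric = numeric_grad(k, A0)
err = max_abs_diff(analytic, numeric)
print(f"max |-2cb^H - finite difference| = {err:.2e}")
```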

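$d_1 + d_2 + d_3$ over $X$ and $Y$

Combining (B.19) and (B.30), we get

\begin{align}
d_1(x_{m,n}) + d_2(x_{m,n}) + d_3(x_{m,n}) &= 2\,\Re\left((ab^H)|_{m,n}\right) - 2\,\Re\left((cb^H)|_{m,n}\right) \tag{B.33}\\
&= 2\,\Re\left((ab^H - cb^H)|_{m,n}\right) \tag{B.34}\\
&= 2\,\Re\left(((a - c)b^H)|_{m,n}\right). \tag{B.35}
\end{align}

Combining (B.21) and (B.32), we get

\begin{align}
d_1(y_{m,n}) + d_2(y_{m,n}) + d_3(y_{m,n}) &= 2\,\Im\left((ab^H)|_{m,n}\right) - 2\,\Im\left((cb^H)|_{m,n}\right) \tag{B.36}\\
&= 2\,\Im\left((ab^H - cb^H)|_{m,n}\right) \tag{B.37}\\
&= 2\,\Im\left(((a - c)b^H)|_{m,n}\right). \tag{B.38}
\end{align}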

Gradient of Second Term of (B.1)

From (B.2) and (B.35), we get

$$\frac{\partial}{\partial X}\left(a^H A (A^H A)^{-1} A^H a\right) = 2\,\Re\left((a - c)b^H\right). \tag{B.39}$$
From (B.2) and (B.38), we get

$$\frac{\partial}{\partial Y}\left(a^H A (A^H A)^{-1} A^H a\right) = 2\,\Im\left((a - c)b^H\right). \tag{B.40}$$

By combining the above, according to Definition 1 of Section 2, we write

$$\frac{\partial}{\partial A}\left(a^H A (A^H A)^{-1} A^H a\right) = 2(a - c)b^H. \tag{B.41}$$
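The combined result (B.41) can also be checked directly, without splitting it into $d_1$, $d_2$, and $d_3$, by differentiating the second term of (B.1) numerically. The Python sketch below is ours, not the report's. It uses the Definition 1 convention $\partial/\partial X + j\,\partial/\partial Y$ in `numeric_grad`, and the sizes, seed, and helper names are arbitrary.

```python
import random


def matmul(P, Q):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def herm(P):
    """Conjugate (Hermitian) transpose P^H."""
    return [[P[i][j].conjugate() for i in range(len(P))] for j in range(len(P[0]))]


def inv(P):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(P)
    M = [list(row) + [complex(i == j) for j in range(n)] for i, row in enumerate(P)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def crandn(rng, rows, cols):
    """rows x cols matrix with i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def numeric_grad(f, Z, h=1e-6):
    """Definition 1 gradient (d/dX + j d/dY) of a real-valued f, by central differences."""
    G = [[0j] * len(Z[0]) for _ in Z]
    for m in range(len(Z)):
        for n in range(len(Z[0])):
            for step, unit in ((h, 1), (1j * h, 1j)):  # real part, then imaginary part
                Zp = [row[:] for row in Z]
                Zm = [row[:] for row in Z]
                Zp[m][n] += step
                Zm[m][n] -= step
                G[m][n] += unit * (f(Zp) - f(Zm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


rng = random.Random(8)
K, M = 5, 3
A0 = crandn(rng, K, M)
a = crandn(rng, K, 1)
aH = herm(a)


def s(A):
    """Second term of (B.1): a^H A (A^H A)^{-1} A^H a."""
    AH = herm(A)
    return matmul(matmul(matmul(aH, A), inv(matmul(AH, A))), matmul(AH, a))[0][0].real


b = matmul(inv(matmul(herm(A0), A0)), matmul(herm(A0), a))  # (B.6)
c = matmul(A0, b)                                            # (B.7)
analytic = [[2 * (a[m][0] - c[m][0]) * b[n][0].conjugate() for n in range(M)]
            for m in range(K)]                               # (B.41): 2 (a - c) b^H
numeric = numeric_grad(s, A0)
err = max_abs_diff(analytic, numeric)
print(f"max |2(a - c)b^H - finite difference| = {err:.2e}")
```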

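Expansion of $(a - c)b^H$

By using (B.7) and (B.6), we can expand $(a - c)b^H$ in terms of $a$ and $A$ as follows:

\begin{align}
(a - c)b^H &= (a - Ab)b^H \tag{B.42}\\
&= \left(a - A(A^H A)^{-1}A^H a\right)b^H \tag{B.43}\\
&= \left(I_K - A(A^H A)^{-1}A^H\right) a b^H \tag{B.44}\\
&= \left(I_K - A(A^H A)^{-1}A^H\right) a a^H A (A^H A)^{-1}. \tag{B.45}
\end{align}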

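Final Results

From (B.1), (B.41), and (B.45), we get

\begin{align}
\frac{\partial}{\partial A}\, r^2 &= -\frac{\partial}{\partial A}\left(a^H A (A^H A)^{-1} A^H a\right) \tag{B.46}\\
&= -2(a - c)b^H \tag{B.47}\\
&= -2\left(I_K - A(A^H A)^{-1}A^H\right) a a^H A (A^H A)^{-1}, \tag{B.48}
\end{align}

which is (11) of Lemma 1.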
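Finally, a self-contained numerical check of (B.48), that is, of (11) in Lemma 1, is sketched below in Python. It evaluates the closed form through the projector $A(A^H A)^{-1}A^H$ and compares it with central differences of $r^2$ from (B.1), using the Definition 1 convention $\partial/\partial X + j\,\partial/\partial Y$. Matrix sizes, the seed, and all helper names are arbitrary choices for illustration.

```python
import random


def matmul(P, Q):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def herm(P):
    """Conjugate (Hermitian) transpose P^H."""
    return [[P[i][j].conjugate() for i in range(len(P))] for j in range(len(P[0]))]


def inv(P):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(P)
    M = [list(row) + [complex(i == j) for j in range(n)] for i, row in enumerate(P)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def crandn(rng, rows, cols):
    """rows x cols matrix with i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def numeric_grad(f, Z, h=1e-6):
    """Definition 1 gradient (d/dX + j d/dY) of a real-valued f, by central differences."""
    G = [[0j] * len(Z[0]) for _ in Z]
    for m in range(len(Z)):
        for n in range(len(Z[0])):
            for step, unit in ((h, 1), (1j * h, 1j)):  # real part, then imaginary part
                Zp = [row[:] for row in Z]
                Zm = [row[:] for row in Z]
                Zp[m][n] += step
                Zm[m][n] -= step
                G[m][n] += unit * (f(Zp) - f(Zm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


rng = random.Random(9)
K, M = 5, 3
A0 = crandn(rng, K, M)
a = crandn(rng, K, 1)
aH = herm(a)


def r_squared(A):
    """r^2 from (B.1): a^H a - a^H A (A^H A)^{-1} A^H a."""
    AH = herm(A)
    proj = matmul(matmul(matmul(aH, A), inv(matmul(AH, A))), matmul(AH, a))[0][0]
    return (matmul(aH, a)[0][0] - proj).real


# Closed form (B.48): -2 (I_K - A (A^H A)^{-1} A^H) a a^H A (A^H A)^{-1}.
G_inv = inv(matmul(herm(A0), A0))
P = matmul(matmul(A0, G_inv), herm(A0))  # projector onto range(A0)
I_minus_P = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(K)] for i in range(K)]
closed_form = matmul(matmul(matmul(I_minus_P, a), matmul(aH, A0)), G_inv)
closed_form = [[-2 * v for v in row] for row in closed_form]

numeric = numeric_grad(r_squared, A0)
err = max_abs_diff(closed_form, numeric)
print(f"max |(B.48) - finite difference| = {err:.2e}")
```

The closed form is evaluated exactly as written in (B.48); forming the projector explicitly is convenient for a check like this, though not necessarily the most economical way to compute the gradient in practice.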